\input paper.tex

\TITL{COGNITIVE  ECONOMY}

\vfill

\AUTHO{Douglas B. Lenat, \ Stanford University}

\AUTHO{Frederick Hayes-Roth, \ Rand Corporation}

\AUTHO{Phillip Klahr, \ Rand Corporation}

\vfill

\vskip 10pt plus 2pt

\NNSECP{ABSTRACT}
       Intelligent systems can explore only tiny subsets of their potential
       external and conceptual worlds.   They must develop efficient  forms
       of  representation,  access,   and  operation   to  increase   their
       capacities.  Some of these  forms involve abstraction, caching,  and
       expectation-simplified processing.  These capabilities in turn can
       combine to provide extremely powerful increases in performance.  For
       example, all three can combine to simplify simulation or one of its
       related functions, the detection of surprising events.  In developing
       economic principles for modern  AI systems, we are forced to favor
       some counterintuitive ideas, policies which are not generally
       followed because their contribution to overall cognitive utility is
       not apparent.  For example, the  nonredundant  storage of properties
       in hierarchical
       inheritance nets  increases many  processing costs  while  providing
       minimal storage cost  savings.

\vfill

\eject

\NSECP{INTRODUCTION}

When we build an AI program, we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral. On the other hand,
we want our systems to run as efficiently as possible, to minimize
temporal delays and to reduce the cpu time and space required.
We are torn between
{\it expressiveness} and {\it efficiency}.


Many AI researchers {\sl cum} language designers have focused on {\it
expressiveness}, the problem of rapidly constructing a working
experimental vehicle.\foo{Consider the goals of {\:c SAFE} [Balzer 77], 
{\:c PSI} [Green 77], {\:c EMYCIN} [ref], {\:c RITA} [or Rosie?], and 
{\:c KRL} [Winograd & Bobrow 77].}
They have produced some excellent software such  as
Interlisp's {\:c DWIM}, File, and Break  Packages.\foo{[Bobrow & Raphael 73]  is
still an excellent  discussion of  the issues  involved in  such ``AI  language"
efforts.}  
Three fundamental  techniques
utilized in highly {\it expressive} programs are:
({\it i}) reliance upon a very high level language, 
({\it ii}) planning at multiple levels of abstraction, and
({\it iii}) inference by pattern-directed knowledge sources.


This paper is attempting to draw attention to the second goal, 
{\it efficiency}.
We present techniques which do not sacrifice expressiveness, yet enable
programs to (semi-)automatically improve themselves and increase their productivity.
The basic source of power is the ability to predict the way that the program will
be used in the future, and to tailor the program to expedite such uses.

The traditional approach to program optimization has assumed that the predictions
are made by the programmer (e.g., by explicitly providing assertions) or can be
inferred statically by
({\it i}) Knuthian and Dijkstrian analysis of the program,
({\it ii}) careful design of the program's data structures to be appropriate,
and ({\it iii}) compilation (as in the {\:c FORTRAN H} compiler, optimizing
transformations of the program [Burstall & Darlington 73], etc.).

We advocate the use of methods more {\it dynamic} than those.
One valuable source of prediction comes from the program's experiences as it is run.
If a program can ``learn" from its experiences, it might try applying each piece of
newly acquired knowledge to the task of specializing itself, modifying itself
to run more efficiently in the current runtime environment.
We believe that a program's ``intelligence" can be increased in this way;
that is, by increasing its ability to
acquire  appropriate knowledge, to organize that knowledge, and to refine
the conditions under which that knowledge is recognized to be applicable.

Some earlier automatic programming systems [Darlington & Burstall, Lenat's PUP6,
Jim Low, Kant & Barstow,
others?] were designed to improve programs' efficiency, and many of their
ideas have found their way into the techniques we now describe.  
Those systems 
had program construction, transformation, and optimization as their primary
task; we are proposing general methods to provide 
self-improving abilities
to other kinds of AI programs
(speech understanders, theory formers, expert simulators, paraphrasers, ...).




Three of the abilities which make a program {\it efficient} are:


\noindent $\bullet $ 
{\bf Dynamic self-monitoring} (the ability  to sense, record, and  analyze
dynamic usage)  {\bf  and  self-modification} (the  ability  to  use  that
knowledge   to   redesign/recompile    itself   with   more    appropriate
representations, algorithms, data structures; i.e., intelligent learning.)

\noindent $\bullet $ 
{\bf Caching of computed results}
(storing the results of frequently-requested
searches, so they needn't be repeated over and over again; i.e.,
intelligent redundancy.)

\noindent $\bullet $ 
{\bf Expectation-filtering}
(using predictions to filter away expected, unsurprising data,
thereby freeing up processing time for more difficult subtasks;
i.e., intelligent focus of attention.)

\yskip

{\it Cognitive economy} is the term  by which we describe such  heightened
productivity. 
Computer programs, no less than biological creatures, must perform 
in an environment: an externally imposed set of demands, pressures,
opportunities, regularities.\foo{For a similar recognition of the
importance of the ``task
environment'', see [ref to Newell & Simon's HPS].}
Extending this analogy, we find that
cognitive economy is the degree to which a program is adapted to its
environment, the extent to which
it has found its environmental niche.

\hbox par 2.9in{    % this makes a space for the little figure; see below.
Notice that representing a corpus of knowledge as a  minimal
(canonical)      generalization/specialization       hierarchy,       with
interpretatively-computed inheritance, is {\it  not} in this category:  it
favors expressiveness and compaction of representation at the expense of
performance. It is  true that  a horse  is a mammal,  and a  mammal is  an
animal, and from those two we could compute that a horse is an animal (see
Fig. n), but it is more cognitively economical to store one redundant  arc
than to recompute it frequently.   Psychological studies [ref] indicate
just such redundancies being created and strengthened by repetitions.
}

[[in space above, insert DIAGRAM OF HORSE/MAMMAL/ANIMAL, 
WITH REDUNDANT ARC AS A DOTTED ARROW]]
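The point can be made concrete.  Below is a minimal sketch, rendered in a
modern language for definiteness (the hierarchy and function names are
ours, purely illustrative): a query answered by rippling up the
isa-hierarchy leaves behind a cached direct arc, so later repetitions of
the same query become a single probe.

    # Minimal hierarchy; "isa" holds only the nonredundant arcs at first.
    isa = {"horse": ["mammal"], "mammal": ["animal"]}

    def isa_query(x, y):
        """Is x a y?  Probe direct arcs first; else ripple upward."""
        parents = isa.get(x, [])
        if y in parents:
            return True
        if any(isa_query(p, y) for p in parents):
            parents.append(y)      # cache the redundant arc (the dotted arrow)
            return True
        return False

    isa_query("horse", "animal")   # computed by rippling, then cached
    isa_query("horse", "animal")   # now answered by a single direct probe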

[[more about what cog. econ is -- and is not -- should go here, probably.]]


Every scientific theory is  constructed in a  rich context of  surrounding
theories, methods,  and standards  which determine  which experiments  are
reasonable ones to perform and which point of view to take when  interpreting
the results -- in short, a {\it paradigm}. We feel it useful
to articulate the ``hard core'' of our paradigm (the assumptions our theory
rests upon) before  delving into more detail about cognitive economy.
For that reason, the entire next section is devoted to a presentation of our
assumptions, including:
a model for intelligence
(Section 2.1),  a  model for  how  intelligent computer  programs  may  be
organized (Section 2.2),  and a  model of the  characteristics of  present
computing engines (Section 2.3).
Section 3 assumes these models, and contains detailed discussions and
examples of the three main components of cognitive economy: 
dynamic self-monitoring and self-modification (Section 3.1),
caching of computed results (Section 3.2),
and expectation-filtering (Section 3.3).
One interesting result that emerges is the prediction that as task size
grows large enough, the techniques which currently are used for expressiveness
(e.g., multiple levels of abstraction) will become important for
cognitive economy -- for efficiency -- as well.

\NSECP{THE  ASSUMPTIONS}


The foundation upon  which our  theories are erected  consists of  several
kinds  of  assumptions:

(i)  We  continually  face  searches  in  more  or  less  immense  spaces;
intelligence is the ability to bring {\it appropriate} knowledge to bear,
to speed  up  such  searching.   Increasing  intelligence  then  comprises
increasing  knowledge,  its  organization,  and  the  conditions  for  its
applicability.  How are  these new discoveries  made? Empirical  induction
(generalizing from experimental and other observations), analogy, and  the
criticism  of  existing   theory  all   lead  to   new  conjectures,   new
possibilities.

(ii) Intelligent systems  can make  the applicability  of their  knowledge
explicit by representing that knowledge as condition/action rules (If {\sl
situation} then {\sl appropriate reaction}). Such pattern-directed
inference systems (PDIS) can benefit from a schema representation  (frame,
unit, being, script, etc.),  because this adds  structure which the  rules
can then take advantage of.

(iii) Computers currently present us  with limited cycles, cheap  storage,
and expensive knowledge acquisition.

While the  reader may  wish to  turn immediately  to a  discussion of  our
detailed cognitive economy ideas (Section 3), it is useful to examine  the
above assumptions in more detail first.


\SSEC{Our Model of Intelligence}


Many human cognitive activities, including most of those commonly referred
to as ``requiring intelligence", can be cast as searches, as  explorations
through  a  search  space,  meandering  toward  some  goal.   We  call   a
problem-solver more  intelligent  if  he can  efficiently  work  toward  a
solution even in the  face of an immense  search space and an  ill-defined
goal.  Somehow, he is  imposing more structure on  the problem, and  using
that to search more effectively. How might he do this?

According to our model of human intelligence, he accomplishes his task  by
bringing knowledge  to  bear, knowledge  not  supplied explicitly  in  the
problem statement.  This knowledge can be about problem-solving in general
(e.g., how to recognize and break cultural blocks) or about the  problem's
domain specifically (e.g., which groups of atoms can usually be treated as
superatoms).  It may have been learned  long ago, or generated during  the
early phase of work on the problem.

This implies  that a  problem  solver can  become  more effective  by  (i)
increasing his knowledge, (ii) better organizing his knowledge, and  (iii)
refining his criteria for  deciding when various  pieces of knowledge  are
applicable.  In terms of production  (If/Then) rules, these correspond  to
adding new rules, modifying  the data structure in  which rules are  held,
and modifying the conditions (IF parts) of existing rules.

How is new knowledge discovered? One route is that of {\it abstraction}:
condensing many experiences into a heuristic which, in hindsight, we see
would have greatly aided us in the past, which would have speeded up our
reaching this state in the search.  Closely related is {\it
generalization}, often merely expanding the domain of relevance of a
specific bit of knowledge we already possessed. {\it Analogy} is one of
the most useful techniques; one can draw parallels not merely to other
existing facts and heuristics, but also to the {\sl paths} which led to
their discovery (e.g., even if a result in organic chemistry does not map
over into the inorganic world, the experiment which led you to that first
fact may map over into an experiment which {\it will} reveal a useful fact
in inorganic chemistry; even though the analogue of a mathematical theorem
may be false in some new domain, the analogous proof technique may be
useful). Another path to discovery is {\it specialization} of more general
knowledge. Finally, we must mention the process of {\it hypothesis,
criticism, and improved hypothesis}, which is a common and powerful method
of spiralling  in toward  precision.  In  mathematics [Lakatos]  and  many
scientific disciplines [Musgrave  & Lakatos],  counterexamples to  current
theories  arise  frequently,  and  their  incorporation  often  demands  a
deepening of the theory, an increase in knowledge.

Experiments such as the AM program [ref] support our assertion that  these
methods can adequately guide even open-ended searches for new  definitions
and conjectures.  But how can an intelligent system be programmed, how can
such systems be organized?


\SSEC{Our Model of Intelligent Program Organization}

The above  remarks  about intelligent  problem  solving apply  equally  to
hardware and wetware alike. Computer programs which are to be  intelligent
must  ultimately  grapple  with   the  tasks  of  knowledge   acquisition,
representation, and refinement. We cannot provide an absolute answer to how
they should be organized, but we  can posit some design constraints  which
have proven useful so far.

A very  general heuristic  in AI  programming is  the following:  If  your
program is  going  to  modify its  own  {\bf  X}, then  make  {\bf  X}  as
separable, clean, and explicit  as possible. In our  case, {\bf X} can  be
instantiated as  "knowledge", or  as  "applicability conditions  for  each
piece of  knowledge".  Thus the  heuristic  advises us  to  represent  our
knowledge in a separate,  clean, explicit form,  say as knowledge  modules
having some fixed structure, and also advises us to keep the applicability
conditions for  each  knowledge  module  separate from  the  rest  of  the
knowledge it contains.

This naturally leads us to  a pattern-directed inference system, in  which
knowledge is broken into  separate modules, each with  an explicit set  of
relevancy tests.  Such systems  arising  in Pittsburgh  may  overemphasize
cleanliness (so-called pure  production systems), while  those arising  in
California may tend to  be a bit  too laid back  (so-called ad hoc  expert
systems), but such variations  should not obscure  their common source  of
power.  The dominant PDIS architecture has knowledge broken into a set  of
condition/action productions, rules of the form IF {\sl triggering
situation} THEN {\sl appropriate reaction}.

Having a clean  representation for  rules means having  a clean,  precise,
elegant language in which to express them.  By structuring the  conceptual
knowledge of  the system,  by partitioning  each module's  knowledge  into
several categories, a rule can  condense an entire cumbersome  description
into  a  single  pointer.   The  popular  schematized  representations  of
knowledge (scripts for episodes, frames for static situations, beings  for
specialists, units for  everything) enable a  rule like "If  there are  no
known methods specific to finding new examples of prime numbers,  Then..."
to have its condition coded as a simple null-test on the To-get subslot of
the Examples slot of the schema for Prime Numbers.  By a judicious  choice
of slot  types and  subslot  types, the  system  builder can  reduce  most
triggering conditions  to  such  quick  checks on  the  state  of  various
(subslots of) slots of schemata.
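As a concrete (and entirely hypothetical) rendering of this idea, the
sketch below, in modern notation, codes schemata as nested slot/subslot
structures; the rule's triggering condition then really is a single
null-test.  The schema contents are invented for illustration.

    # Concepts as schemata; each slot may carry subslots such as To-get.
    schemata = {
        "Prime Numbers": {
            "Generalizations": {"value": ["Numbers"]},
            "Examples": {"value": [2, 3, 5, 7], "To-get": []},
        }
    }

    def condition_holds(concept, slot, subslot):
        """IF-part: no known methods recorded on this (sub)slot."""
        return not schemata[concept][slot].get(subslot)

    # "If there are no known methods specific to finding new examples of
    # prime numbers, Then..." becomes:
    if condition_holds("Prime Numbers", "Examples", "To-get"):
        pass  # fire the rule's THEN-part here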

Additional knowledge is required to efficiently locate all the rules which
{\it might} have their conditions satisfied in a given situation, and also
to decide which rules to actually  fire (obey) among those whose IF  parts
have actually triggered (been satisfied).

The location of  potentially-relevant rules is  facilitated by  organizing
the  rules   into  some   structure.   For   example,  AM[ref]   organized
mathematical concepts into a generalization/specialization hierarchy,  and
tied each rule  to its most  relevant concept. To  find rules  potentially
relevant to concept C,  AM then needed  only to look  at the rules  tacked
onto C, C's generalizations,  {\it their} generalizations,  and so on.  By
inheritability, any  rule  relevant  to judging  Objects  in  general  was
(potentially) relevant  to judging  an Abelian  group; any  technique  for
estimating the value of any function was relevant to estimating the  value
of Ackerman's function.
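A minimal sketch of this rippling, in modern notation (the hierarchy and
rule names are invented, not AM's actual contents): the rules potentially
relevant to C are those tacked onto C and onto every generalization
reached by following genl links upward.

    genl = {"Abelian group": ["Group"], "Group": ["Object"], "Object": []}
    rules = {"Object": ["judge-any-object"], "Group": [], "Abelian group": []}

    def potentially_relevant(concept):
        gathered, frontier = [], [concept]
        while frontier:                       # ripple upward through genl links
            c = frontier.pop()
            gathered.extend(rules.get(c, []))
            frontier.extend(genl.get(c, []))
        return gathered

    potentially_relevant("Abelian group")     # inherits 'judge-any-object'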

This requires an explicit focusing of attention, a selection of a "current
topic" C from which the above rippling proceeds. We favor the  maintenance
of a separate  data structure  for this purpose,  an agenda  of topics  to
consider, of subtasks  to work  on. Knowledge may  be brought  to bear  to
select a topic,  then the above  quest for potentially  relevant rules  is
carried out, then they are examined  to find the truly relevant  satisfied
set, and finally some subset of those are fired off.
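Schematically, one cycle of such a control regime might look like the
sketch below (our own minimal rendering, not AM's actual code; the worth
numbers and the choice of subset are placeholders):

    import heapq

    # Agenda of topics, kept as (negated worth, topic) so the heap pops
    # the most worthwhile topic first.
    agenda = [(-0.9, "Prime Numbers"), (-0.4, "Sets")]
    heapq.heapify(agenda)

    def one_cycle(find_rules, satisfied, fire):
        _worth, topic = heapq.heappop(agenda)   # select the current topic C
        candidates = find_rules(topic)          # ripple for relevant rules
        triggered = [r for r in candidates if satisfied(r, topic)]
        for rule in triggered[:3]:              # fire some subset of those
            fire(rule, topic)                   # may add new agenda tasks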



\SSEC{Our Model of (Present) Computers}


Since we  are  considering the  problem  of building  computer  models  of
intelligent  behavior,   many   of   our   assumptions   deal   with   the
characteristics of present-day computers. They are the symbol manipulators
we use, but were not designed for that general purpose.

[RICK:  FILL THIS SECTION OUT; 
	2 BEAUTIFUL PARAGRAPHS WILL DO IT. BASIC IDEA WAS:
	Computers currently present us with limited cycles, cheap storage,
	uniprocessing, and expensive knowledge acquisition.]


\NSECP{DETAILS OF THE SOLUTION}


\SSEC{SELF-MONITORING & MODIFICATION: INTELLIGENT LEARNING}

  \SSSEC{DYNAMICALLY MONITORING ITS TASK ENVIRONMENT}

  \SSSEC{MODIFYING ITS DATA STRUCTURES AND REPRESENTATIONS}

  \SSSEC{MODIFYING THE CONTROL STRUCTURES AND ALGORITHMS}

  \SSSEC{FINDING THE PROPER LEVEL OF ABSTRACTION}

A train accident occurs, and over two hundred people are injured; the
only large nearby hospital is called, and asked if they can handle the load.
The administrator queries his shiny new simulator, and it begins to chug
away.  It looks over the state of each current patient, each physician,
each hospital resource.  It begins simulating the arrival of the wounded,
their movement and temporary storage in hallways, etc.  After quite a while,
it shows that the hospital will be saturated and unable to care for any more
incoming patients.  The maximum they can accept is 157 patients.
If "quite a while" is several hours (which it very likely will be), this is
simply inappropriate. There is a pressing need for a rough answer quickly;
a {\it human} hospital administrator would answer right away, or after a
very brief survey of a few parameters, and our belief is that {\it so should
an intelligent program}.

A colonel monitoring our defense systems detects a fleet of enemy aircraft
attacking in an unexpected pattern. Can our bombers and missiles meet this
new attack mode?  The colonel will know precisely how much time is available
to answer that question, and it will be a matter of seconds or minutes, not
of hours or weeks.  A detailed simulation program will be of no use to him.
Reasoning will go on at a high level of abstraction, and an approximate answer
will be forthcoming (either from a program capable of such inference, or
-- in lieu of that -- in the colonel's head).


[what are some other possible examples to use here?  We should pick say two,
and use the same ones throughout the paper, whenever possible.]




In many real-life problem solving situations, the "goal" is not a single,
precise answer. Rather, there is a cost/benefit tradeoff between the
accuracy of the solution and the amount of resources consumed in producing
it.  As with any such tradeoff, there is typically a point of diminishing
returns.  Computer programs which are designed only to return exact answers
will then be ill-suited to the needs of the various situations in which they
will be called upon.  One area in which this is recognized in AI is
game-playing.  The concept of truncated search is fundamental in all chess
programs.  Yet very little of the same philosophy has permeated to other
kinds of AI programs.

[this last para. needs reworking; my analogy to truncated search may confuse
more than illuminate.]
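One way to honor such a tradeoff is sketched below, under invented names:
an estimator that refines its answer level by level, stopping at the
point of diminishing returns or at its deadline, whichever comes first.
No real hospital or defense model is implied.

    import time

    def estimate(levels, deadline_secs, good_enough):
        """levels: run-functions ordered from crudest to most detailed."""
        answer, start = None, time.time()
        for run in levels:
            if time.time() - start > deadline_secs:
                break                   # out of time: report best so far
            new = run(answer)           # refine, using the cruder estimate
            if answer is not None and good_enough(answer, new):
                return new              # diminishing returns: stop early
            answer = new
        return answer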


But this is very strange: we earlier said that reasoning at multiple levels of
abstraction was a technique for heightening expressiveness, not efficiency.
Is this such a powerful ability that it helps improve both simultaneously,
or is something else going on?  The answer lies, significantly, in the
nature of the task environment.  Multiple levels of abstraction cost a certain
amount to implement, both initially (man-hours of programming) and as the
program runs (cpu time spent maintaining the redundancy, choosing which level
to work on next, etc.) For problems of all sizes, this feature aids in the
expression of new problems to the program, since they may be input at whatever
level(s) seem appropriate to the user. As the graph below shows, then,
multiple levels of abstraction will always add to expressiveness, and will
either detract from or add to efficiency, depending upon how small or large
the task is.

There is a more general result here: the methods which humans find useful for
easy expression of problems may be useful to programs attacking very large
problems.  For instance, consider the decision about whether to do inference
by a system of pattern-invoked experts, or rather to preprocess the triggering
conditions into a big discrimination network. The former approach is more
expressive, the latter more efficient -- at least for small problems. For
very large tasks, however, the need to remain flexible dominates. It is then
more economical to retain pattern-directed control (the compilation of the
patterns is so complex that the whole procedure would have to be redone
frequently during the course of execution, if we opted for that alternative
instead).



	(see above)	Desirability of being able to compute rough answers cheaply
			Simulation

	Conceptual hierarchies
		Hier. in general: possible organizing relations
			Supervises, commands, controls, calls (as in subroutines)
			Containment: part-of versus specialization-of versus
					application-of versus element-of

		Inheritability: the importance of "or any ancestor"
			The notion that the hier. is a compact way of storing
				a huge set of assertions, is fairly efficient
				vis a vis retrieving / testing for them, and
				is much much more efficient for updating them.
			This discussion should be general to all above
				organizing relns, not just genl/spec hiers.
		Make sure to distinguish our meaning for abstraction
			(redundant, generalized version) from the other
			common meaning (inducing a general case from examples)

	Heuristic Hierarchies
		Each heuristic H has a domain of relevance
		More precisely, H's relevance is a function of the task posed,
			and is high for some (its domain of relevance),
			low for many more, and perhaps negative for others.
		Heuristics can be related to each other by genl/specialization
			[example: "If P(x) is often false, weaken P" -->
				Also an example involving sim.]
		Heuristics can be object-centered (tacked onto relevant
			concepts) and can therefore inherit THEIR relationship
			[an ex. of that..  This is "Inheritability" in AM
			sense.]



	Interpretation and planning at levels of abstraction
		Eg., rules of bomber simulation at difft levels
		(this example will ultimately be used to suggest
		 caching for simplification)
		[this ties in with the last point about heur. hier.]
		Part of this para. might be this example:

"In the short run, what would happen if we didn't take any defensive
actions in a full-scale enemy attack situation?"  A strategist can
answer this instantly, to the effect that the enemy missles and bombs
would then reach their targets, destruction would be nearly total, our
capacity for retaliation would be severely reduced, and so on. He can
say this without factoring in the individual changes in aileron
attitude throughout the flight of each bomber.
He has a high-level "script" which specifies defensive action against an
attack, with a small note of explanation (e.g., "else destruction"), and
that's as detailed as he needs to get to answer the question.
Now consider a question like "What if we had a fleet of 200 B1 bombers
in addition to our current forces?" The strategist may again be capable
of giving a quick estimate of the effect, but it's possible that a more
detailed analysis would reveal some cusp, some nonlinearity not apparent
to him through simple extrapolation, through the application of general,
high-level rules of thumb.  One possibility is that his old, general,
high-level scenario could be used as a plan, as a set of general expectations
which guide the detailed simulator, which predict which aspects of the
world will benefit from much more detailed simulation (e.g., encounters where
our bombers hitherto failed to reach their targets), and which areas will
probably be unaffected by the change in the world (e.g., attacks on our
coast by enemy submarines).
[note that we can easily extend this discussion a bit for caching: keep
around the old simulation, at various levels, and only re-run those parts
which might be affected by the change in initial situation, and even then
only rerun it at the highest level possible.  This latter decision is probably
always made locally: run until the results stop changing, then you know
you've expanded down one level too far, so you can back up and stop.]


	Related research
		Include mention of psych. studies showing that people may
			reason at the word level, rather than more primitive
			level, because the high-level word manipulation
			process is more efficient than the manipulation of
			large, detailed structures (such as huge sem. nets).

\SSEC{CACHING:  INTELLIGENT REDUNDANCY}


	Modifying memory to save computed results to speed subsequent accesses
		Simplest case: After computing F(x), remember it.
		Also important: After computing F(x), remember the PATH
			i.e., the way you computed it, the traps, the blind
			alleys, the overall type of successful approach, etc.
			(both ideas are sketched below)
		Analogue in chess: first hinted at by Newell, Shaw, and Simon
			in their early chess paper, then elaborated by Berliner
			in CAPS: the idea that you should abstract out a
			description of a path through the search tree, so
			that the remainder of the search could partake of it.
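A minimal sketch of both ideas above, in modern notation (the cache-entry
fields are ours, purely illustrative): after computing F(x), store not
only the value but also a description of the PATH by which it was found.

    cache = {}

    def cached_f(f, x):
        if x in cache:
            return cache[x]["value"]    # remembered: no recomputation
        value, path = f(x)              # f also reports how it got there
        cache[x] = {"value": value,     # the result itself
                    "path": path}       # e.g. {"approach": ..., "blind_alleys": [...]}
        return value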

	Generalization of hardware concept
		What more to say than the name, the hardware defn and uses,
		and the obvious analogy to our above proposal.
		Perhaps here is a good place for some rough calculations
			which show the extra cost and benefits in some sits.
			In which case, use the same few situations all through
			the paper as running examples.

	EURISKO types of caching, as first examples
		This means a diversion into representation of concepts as
		slots and subslots (facets and subfacets).
		Two types of slots: elementary and virtual.
		Blend them together: a cached virtual slot appears much
			like an elementary primitive slot.
		Note this is most applicable when the values are slowly-
			changing.  This is ideal for the defns. of slots.
			This enables the program to occasionally change a
			slot's definition and yet (except right after such a
			change) run as efficiently as if that capability
			were not there.
		Analogous to COMPILING.
			Both of functions and of definitions.
		Make sure it's clear that caching of defns. is not in any
			way a special feature:  any facet can be cached.
		Options involved:
			The basic idea is that a request comes in with
			some description of the cost/benefit curve
			(for each amount of resources expended and each
			degree of precision of the result, specify how
			acceptable that resource consumption / accuracy of
			solution pair is.)  One extreme of this is to provide
			an ironclad limit on resources (Bobrow's resource-
			limited computation), and say "Do the best you can
			within these time/space/... limits."  Another extreme
			(most common in present-day programs) is to specify
			an ironclad goal criterion ("An answer which is at
			least this good, no matter how long it takes to get.")
			We are making the reader aware of the space in between
			the two extremes.
			Now that we went up to lofty heights, come back down.
			In Eurisko, we linearized this space.  We picked a
			few telling parameters of it, and said that a description
			was merely a list of values for these parameters.
				Time, space, user communication, degree of precision.
			Go through a couple examples in detail.
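Here is one such worked example, wholly invented (the slot names, the
parameters chosen, and the numbers are ours, not Eurisko's actual code):
a cached virtual slot that, once computed, reads exactly like an
elementary slot, consulted through a linearized request description.

    unit = {
        "examples": [2, 3, 5, 7],                      # elementary slot
        "small-examples": {                            # virtual slot
            "defn": lambda u: [x for x in u["examples"] if x < 5],
            "cache": None, "precision": 0.0},
    }

    def get(u, slot, request=None):
        # request is the linearized description: a few parameter values.
        # (A fuller version would also check the time/space allotments.)
        request = request or {"time": 1.0, "precision": 0.5}
        v = u[slot]
        if not isinstance(v, dict):
            return v                    # elementary: direct read
        if v["cache"] is None or v["precision"] < request["precision"]:
            v["cache"] = v["defn"](u)   # (re)compute from the definition
            v["precision"] = 1.0        # the defn gives an exact answer here
        return v["cache"]               # now reads like an elementary slot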

	Contrast with psychological conjectures of cognitive economy
		(e.g., Collins&Quillian, KRL, ...).  More like $HR↑2$ Plasticity
		model of storing all retrieved paths as direct links
		[I can guess what this means, but you (Rick?) should do this
		section.]

	General principles
	   Note that this is in general a heuristic rule driven decision-
		making process, and what we have specified below are just
		some of the relevant heuristics. Many of the more general,
		commonsense ones are absent, and even the ones present are
		not specified all the way down to the operational (LISP) level.
		If a program had these rules, the categories below would be
		the various kinds of subslots allowable (differentiable).

	   Note that there must exist heuristics for determining the
		applicability of the following, as a function of machine
		architecture. Equivalently: we haven't fully worked out
		the left hand sides (domains of relevance) of the rules.




	    Updating Principles
	    -------------------

	       When
		If the curr. request's description specifies that a more
			precise answer is needed than the one cached.
		Only if the resource alloc. is high enough to get a better
			or more current answer than the one cached already.
		Both of these mean: If you've gotten a better value to cache.

		Of course, if there is no cached answer available, update=write.

		Typically: the cached value was F(x), and x changed,
			or the definition of F changed.


	       Why

		Frame problem:  the value is old, and the world (may've) changed.
		A new value is computed (either intentionally or accidentally)
			and its current-ness makes it "better" perhaps.

	       How
		  get demon traps that flag the cache as out of date
		  the user requests updating if the cache seems stale

		Usually: discard the whole old value, store just the new one.
		If the program is very smart: When the world changes, try to
			propagate that change through the network of caches,
			changing THEM. This is much better than erasing them!
			Give an example of this in action (prob. fictitious).
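Here is such a (fictitious) example in modern notation: each cache
records the facts it depends on, and a change is propagated through
those dependency links, patching the affected caches rather than erasing
them wholesale.  All names are illustrative.

    dependents = {"defn-of-F": ["F(3)", "F(9)"]}   # fact -> caches built on it
    caches = {"F(3)": 9, "F(9)": 81}

    def world_changed(fact, patch):
        for name in dependents.get(fact, []):
            caches[name] = patch(name, caches[name])  # propagate, don't erase

    # e.g., if F's definition changed from squaring to cubing, each cached
    # value v can be patched in place to v**1.5 (since (x**2)**1.5 = x**3):
    world_changed("defn-of-F", lambda name, old: old ** 1.5)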



	       Where

	       In what form

	       What

	       When not to
		This should be merged with "When to"
	       How to
		How is this different from "How", above?






	    Storage Principles
	    ------------------

	       When
		  Every time you have to call a lower order function to eval. it
		    & it took quite a while.
		  You've called it before, recently, & the value didn't change.
		If the amount of processing to get it was very high.
			Esp: after a lucky accident, where a repeat might
			be very costly or even fail entirely to duplicate
			your recent success at getting a solution.
		If (freq of usage) x (avg cost to recompute) / (freq of change)
			is high, then it pays off.  (See the sketch below.)
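Made concrete, with invented numbers and an invented threshold (the
storage-plus-checking overhead the ratio must beat):

    def worth_caching(use_freq, recompute_cost, change_freq, threshold=10.0):
        # (freq of usage) x (avg cost to recompute) / (freq of change)
        return use_freq * recompute_cost / max(change_freq, 1e-9) > threshold

    worth_caching(use_freq=50, recompute_cost=2.0, change_freq=0.1)   # True
    worth_caching(use_freq=2, recompute_cost=0.01, change_freq=5.0)   # False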




	       Why
		  Cost of recomputing vs. cost of storage.
		  Context of subsequent calls is similar enough (e.g., the same
		    arguments will come up again).
		Related to "When"; namely: in those situations, it's definitely
			cost-effective to store a cache.
		Analogy to hardware utility.

	       How
		  Called functions might suggest how to cache their value in higher
		    calling caches (e.g., my value changes often so cache my defn.).
		  Cache should be transparent & discardable (should be able to throw
		    them all away if space needed).
		Make sure to store a description of the CALL which led to
			this value being computed.  I.e., make sure you
			store the amount of resources brought to bear, the
			time of day (to tell freshness later on), whether
			any other caches were used in this search, etc.
		If possible, predict (criteria, time-limit, etc.) under
			which this will no longer be current (eg, depends
			on defn of Y, depends on there being no exs. of Z...)

	       Where
		The value to be cached should have a precise storage place.
		One extreme is an assertional database, with NO structure.
		We prefer to add slot and subslot structuring, with some
		domain-specific tinkering of those two sets (esp the former).
		Thus an extreme example of prime numbers should be stored
		on a slot called extreme-examples, on a schema called primes.
		At access time, we look in the precise spot(s) where X would
		be, and if it isn't there, we compute and cache it right there.

	       In what form
		  Value or expression?  I.e., at what level of abstraction
		    (partially evaluated vs. symbolic expression).

		  Stack previous values to enable you to tell if they're changing.

	       What
		  You store a flag saying you've been here before.
		  When it was computed.
		  How much effort was expended on it, down to what levels of
		    algorithms, with what other caches incorporated.
		  Certainty of the result.

	       When not to
		  The value changes too frequently.
		The value is so critical that the cost-benefit criteria
			(eg the resource limits are enormous) always favor
			recomputation rather than chancing a blunder.
		  The function evaluates as fast as the caching mechanism itself
		  Space is too tight
		In general, these are all related to When (above).

	       How to eliminate caches
		  Space tight --> eliminate least-recently-used caches (last referenced)
		If freq. of use is negligible
		Just reverse the formula in "When" (above), to see if it
			is no longer cost-effective to keep the cache.
		Remember that a value will occupy varying amts. of space.
			AM, eg, eliminated large cached values first.



	It's also a good place, here, to present the progression of caching:
		This shows how "heuristics" are caches of experience, too.


ACCESS ==> CALCULATE ==> DEDUCE ==> INDUCE BY OPEN-ENDED SEARCH

      \    /        \    /     \    /

      cache        thm, alg  heur rule


[eg, a theorem or algorithm is one way to simplify a deduction, to reduce
it to mere calculation.  Heuristics are capable of reducing discovery tasks
to mere symbolic manipulation, to mere deductive computation (as in AM); and
caches are capable of reducing calculations to mere lookups]


MAYBE make some sense of this and put it in:
Save redundant genl/spec links; save values of virtual slots; save multiple
instantiations of the same rule at different levels of abstraction;
record condensations of many experiences as new heuristic rules.
[tradeoffs: size of KB;  updating problems; precision in reasoning one's
way through delicate inference chains efficiently; synthetic reasoning;
rule interaction]

Mention the psych. studies that show that people do develop direct associations
between words that are co-retrieved often. Thus dog and animal are closer
than dog and mammal, even though dog --> mammal --> animal in the hierarchy.




\SSEC{EXPECTATION-SIMPLIFIED PROCESSING: INTELLIGENT FOCUS OF ATTENTION}

	Central notion:  reserve your computing for opportunities to realize
		potential for expanding knowledge
		Ie: be willing and able to add new facts,
			but be eager to add surprising new facts.
		Equivalently: by being guided by expectations (scripts, frames)
			you will find yourself zipping through 90% of what
			comes in from the world, and spending most of your
			processing time on the remaining puzzles, anomalies.
	You may decide how much to expend re-confirming the expected
		You may decide how much time to spend deciding how much time..
	Reductions realizable through expectations:
		Perceptual set:  see what you expect with less effort
		Surprise:  heighten sensitivity to data inconsistent with
			expectations
			Perhaps mention analogy to frog's visual system.
		Predict and prepare
			This includes a whole continuum, growing more
			flexible, more descriptive (less scalar), dynamic:
		  > Store a value, by binding a variable to it, so you can
			later refer to it (retrieve it) by name.
		  > Cache the value of a function-call, so if you ever want
			to find F(x), first you look to see if that value is
			already computed and stored away, else you compute it.
			Note that this is much like hashing, like associative
			memory: the idea that you have a place where F(x)
			should be (the hash of the string PARENTS(FRED), eg)
			and you look there first. This is just a genl. of
			the previous ">" about binding a variable to a value.
		   > After finding F(x), store the information that would
			enable you to recompute it rapidly.  Hopefully, this
			will also help you compute F(y) (either for all y
			or at least for some y's similar to x), and possibly
			will even help you compute G(x), where G is similar
			to F (and, much more rarely, G(y)).
			The idea here is to cache an algorithm, or at least
			constraints on a search, for finding F(x).
			One very special case of this is normal compiling:
			the compiled code for F is just a more efficient way
			of finding values of F. As with our above remarks, one
			often finds that his code worked interpreted but not
			compiled. This is considered "wrong" (ie a bug in the
			compiler), but is much more in line with our general
			idea of mere assistance, which is sometimes quite
			useful but may sometimes not help you at all.
		   > Synthesize and store away a partial model of the world.
			This may include a (sub)network of frames, which
			together capture enough about the situation that
			in the future its relevance will be recognized, and
			some aspects of this situation will help guide future
			processing on some problem (which is recognized as
			similar).  This is much like analogy, like real
			"learning" (storing away experiences).

	What mechanisms are implicated?
		Caching
		  This should also encompass normal compiling of subroutines.
		PDMs (as triggers or demons)
		  The analogy: you hear someone yell "Fire!" in a theater.
		  Also: mention the basis for much of humor: the punchline.
			The intentional misleading initially (wrong LHS)
			An unexpected reaction (wrong RHS)
			This may even provide a start at a taxonomy of
				humor, one which could be operationalized.
	Relevance to learning
		Confirm or disconfirm predictors
		This requires setting up PDMs to fire on dis/confirmation
		Yes, the idea is that when an expectation is not met, we
			also (have some chance of) modifying the culprit
			rule(s) that made that false prediction (and/or
			adding new rule(s) which would make the right one).
			This is v. similar to Mycin's task, and the TEIRESIAS
			approach is worth mentioning.
		This ties learning in very closely to scientific theory
			formation (at least those theories of knowledge which
			are activist: revolutionary conventionalist,
			methodological falsificationist, sophisticated
			falsificationist).
			[those are the nodes descending in a genl/spec hier
			of theories of knowledge]
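Schematically, the expectation-filter just described might look like the
sketch below (all names illustrative): data matching a prediction is
skimmed past; surprises get full processing, and blame is assigned to
the rules that made the false prediction.

    def process_stream(data, predictions, full_process, blame):
        for datum in data:
            expected = predictions.get(datum["about"])
            if expected == datum["value"]:
                continue                  # unsurprising: zip past it
            full_process(datum)           # anomaly: focus attention here
            if expected is not None:      # a prediction existed and failed:
                blame(datum["about"], expected, datum["value"])  # fix culprit rule(s)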



\NSECP{COGNITIVE ECONOMY REVISITED}

[let's leave this section until last, till the others are written out]

	Sample problem:  using a world model (simulator) to answer questions
		(e.g., what'd happen if 100 bombers went in there?)
		Representation of this knowledge as PDMs at difft levels of abstn
		Ability to generalize and cache results at one level at the
			next higher level,
				e.g. either as numerical tables, stat. distns, or
				symbolic expressions
		Ability to answer some questions appropriately for the requestor
			at a high level of abstraction

	KB Design
		One good reason to use inheritance is to speed knowledge
			implementation, not computing performance
		Using the system should result in its speedup
		Storage should be cheap
	Machine architecture
		PDI should be cheap
		PDMs should be scheduled with variable resources and
			should be able to spend effort accordingly
		How could propagation of changes be made efficient?



\NNSECP{Acknowledgements}

We wish to thank...

\vfill\end